Mitigating the Curse of Dimensionality for Exact kNN Retrieval

Authors

Michael A. Schuh

Tim Wylie

Rafal A. Angryk

Abstract

Efficient data indexing and exact k-nearest-neighbor (kNN) retrieval are still challenging tasks in high-dimensional spaces. This work highlights the difficulties of indexing in high-dimensional and tightly-clustered dataspaces by exploring several important tunable parameters for optimizing kNN query performance using the iDistance and iDStar algorithms. We experiment on real and synthetic datasets of varying size, cluster density, and dimensionality, and compare performance primarily through filter-and-refine efficiency and execution time. Results show great variability over parameter values and provide new insights and justifications in support of prior best-use practices. Local segmentation with iDStar consistently outperforms iDistance in any clustered space below 256 dimensions, setting a new benchmark for efficient and exact kNN retrieval in high-dimensional spaces. We propose several directions of future work to further increase performance in high-dimensional real-world settings.
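For readers unfamiliar with the filter-and-refine pattern mentioned in the abstract, the following is a minimal sketch of the general iDistance idea: each point is mapped to a one-dimensional key built from its partition and its distance to that partition's reference point, and a query filters candidates by key range before refining with true distances. It is not the authors' implementation and does not cover iDStar's local segmentation. The class name `IDistanceSketch`, the spacing constant `c`, the starting radius, and the radius-doubling rule are all assumptions made for illustration, and a sorted Python list stands in for the B+-tree.

```python
import bisect
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class IDistanceSketch:
    """Toy iDistance-style index; a sorted key list stands in for the B+-tree."""

    def __init__(self, points, refs):
        self.points, self.refs = points, refs
        # Partition: assign every point to its closest reference point.
        part = [min(range(len(refs)), key=lambda i: dist(p, refs[i])) for p in points]
        to_ref = [dist(p, refs[i]) for p, i in zip(points, part)]
        self.radius = [0.0] * len(refs)
        for i, d in zip(part, to_ref):
            self.radius[i] = max(self.radius[i], d)
        # Assumed spacing constant: larger than any partition radius,
        # so the key ranges of different partitions never overlap.
        self.c = max(self.radius) + 1.0
        # One-dimensional key: partition id * c + distance to the reference point.
        self.entries = sorted((i * self.c + d, n) for n, (i, d) in enumerate(zip(part, to_ref)))
        self.keys = [key for key, _ in self.entries]

    def knn(self, q, k):
        """Exact kNN. Returns (sorted [(distance, point index)], candidates refined)."""
        k = min(k, len(self.points))
        if k <= 0:
            return [], 0
        dq = [dist(q, ref) for ref in self.refs]
        seen = {}  # point index -> true distance (results of the refine step)
        r = 0.05 * self.c  # assumed starting radius
        while True:
            for i, d in enumerate(dq):
                if d - r > self.radius[i]:
                    continue  # query sphere cannot reach partition i
                # Filter: by the triangle inequality, any point within r of q
                # has a reference distance inside [d - r, d + r].
                lo = bisect.bisect_left(self.keys, i * self.c + max(d - r, 0.0))
                hi = bisect.bisect_right(self.keys, i * self.c + min(d + r, self.radius[i]))
                for _, n in self.entries[lo:hi]:
                    if n not in seen:
                        seen[n] = dist(q, self.points[n])  # refine
            best = sorted((dv, n) for n, dv in seen.items())[:k]
            # Stop once the k-th candidate lies inside the searched radius.
            if len(best) == k and best[-1][0] <= r:
                return best, len(seen)
            r *= 2.0


# Synthetic clustered data; the cluster centers double as reference points.
random.seed(0)
centers = [[random.random() for _ in range(16)] for _ in range(4)]
points = [[x + random.gauss(0, 0.05) for x in random.choice(centers)] for _ in range(400)]
index = IDistanceSketch(points, refs=centers)
query = [x + 0.01 for x in centers[0]]
neighbors, checked = index.knn(query, k=5)
```

Here `checked` counts how many candidates reached the refine step, which is one simple way to look at filter-and-refine efficiency: the closer it is to `k`, the more work the key-range filter saved relative to a full scan.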

Similar Resources

Minimizing the Number of Keypoint Matching Queries for Object Retrieval

To increase the efficiency of interest-point based object retrieval, researchers have put remarkable research efforts into improving the efficiency of kNN-based feature matching, pursuing to match thousands of features against a database within fractions of a second. However, due to the high-dimensional nature of image features that reduces the effectivity of index structures (curse of dimensio...

The Role of Hubs in Cross-Lingual Supervised Document Retrieval

Information retrieval in multi-lingual document repositories is of high importance in modern text mining applications. Analyzing textual data is, however, not without associated difficulties. Regardless of the particular choice of feature representation, textual data is high-dimensional in its nature and all inference is bound to be somewhat affected by the well known curse of dimensionality. I...

An Appraise of KNN to the Perfection

K-Nearest Neighbor (KNN) is a highly efficient classification algorithm due to key features such as ease of use, low training time, robustness to noisy training data, and ease of implementation. However, it also has shortcomings such as high computational complexity, large memory requirements for large training datasets, the curse of dimensionality, and equal weights given to all attributes. Many ...

LSI vs. Wordnet Ontology in Dimension Reduction for Information Retrieval

In the area of information retrieval, the dimension of document vectors plays an important role. Firstly, with higher dimensions index structures suffer the “curse of dimensionality” and their efficiency rapidly decreases. Secondly, we may not use exact words when looking for a document, thus we miss some relevant documents. LSI (Latent Semantic Indexing) is a numerical method, which discovers ...

Minimizing the Number of Matching Queries for Object Retrieval

To increase the computational efficiency of interest-point based object retrieval, researchers have put remarkable research efforts into improving the efficiency of kNN-based feature matching, pursuing to match thousands of features against a database within fractions of a second. However, due to the high-dimensional nature of image features that reduces the effectivity of index structures (curse...

Publication date: 2014
